<html>
<head>
<link rel="shortcut icon" href="./favicon.ico">
<link rel="stylesheet" type="text/css" href="./style.css">
<link rel="canonical" href="./Pipeline_Credit_Buffer.html">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Pipelines the control and data of a ready/valid hanshake with zero or more plain [Registers](./Register.html) and a [FIFO](./Pipeline_FIFO_Buffer.html) to control the propagation delay and increase the possible clock frequency.  The latency from input to output is `PIPE_DEPTH` cycles, plus 2 from the output FIFO.">
<title>Pipeline Credit Buffer</title>
</head>
<body>

<p class="inline bordered"><b><a href="./Pipeline_Credit_Buffer.v">Source</a></b></p>
<p class="inline bordered"><b><a href="./legal.html">License</a></b></p>
<p class="inline bordered"><b><a href="./index.html">Index</a></b></p>

<h1>Pipeline Credit Buffer</h1>
<p>Pipelines the control and data of a ready/valid hanshake with zero or more
 plain <a href="./Register.html">Registers</a> and
 a <a href="./Pipeline_FIFO_Buffer.html">FIFO</a> to control the propagation delay and
 increase the possible clock frequency.  The latency from input to output is
 <code>PIPE_DEPTH</code> cycles, plus 2 from the output FIFO.</p>
<p>In a nutshell, a credit buffer has an input interface which knows the
 latency of the register pipeline, tracks how many data transfers have been
 sent from the input interface to the output interface (which subtracts from
 the credit count), tracks how many transfers have left the output interface
 (which adds to the credit count), and thus how much capacity (credit)
 remains in the output interface buffer (minus any data in-flight in the
 pipeline). With that information, we can modulate when the input interface
 can accept new transfers. The FIFO at the output interface is necessary to
 receive any transfers still in-flight in the register pipelines while the
 input interface does not accept new transfers (credit is zero).</p>
<p>Unlike a <a href="./Skid_Buffer_Pipeline.html">Skid Buffer Pipeline</a>, a Credit
 Buffer Pipeline can also improve concurrency by absorbing any
 irregularities in the transfer rates of the input and output interfaces: if
 one interface stalls, the other interface will not see that stall for as
 long as there is either enough available data or space in the FIFO.</p>
<p>For longer pipelines, a Credit Buffer Pipeline uses less hardware than
 a Skid Buffer Pipeline, especially if the FIFO fits inside a denser Block
 RAM, and may fit some underlying devices better (e.g.: the routing
 hyper-registers in high-end Intel FPGAs). For reference, see Abbas and Betz
 <a href="./reading.html#elastic">"Latency Insensitive Design Styles for FPGAs"</a>
 (FPL, 2018).</p>
<h2>Configuration</h2>
<p>The <code>PIPE_DEPTH</code> parameter specifies the number of pipeline registers on
 the <code>input_ready</code> and <code>input_valid/input_data</code> paths. Adjust <code>PIPE_DEPTH</code>
 to break any critical paths between ready/valid interfaces. A value of zero
 will not implement any pipeline registers and reduces the design to a very
 short FIFO (see below for minimum size calculation).  </p>
<p>The <code>FIFO_DEPTH</code> parameter specifies the depth of the FIFO buffer before
 the output port.  The minimum FIFO depth for correct operation and to
 support full throughput is the sum of:</p>
<ul>
<li><code>PIPE_DEPTH</code>: enough for the FIFO to completely receive the contents of
 the pipeline registers. </li>
<li><code>PIPE_DEPTH</code> again: to allow for the credits from completed output
 handshakes to reach the input interface before credits run out.</li>
<li>2 cycles to account for the input/output FIFO latency.</li>
<li>1 cycle to account for updating the credit counter.</li>
</ul>
<p>If the given <code>FIFO_DEPTH</code> parameter is smaller than this sum, it is
 automatically adjusted internally to this minimum sum. You can also specify
 a <code>FIFO_DEPTH</code> <em>larger</em> than this sum to allow the FIFO to absorb stalls on
 the input and output interfaces of duration less than or equal to the extra
 FIFO depth, which will improve concurrency.  Adjust the <code>FIFO_RAMSTYLE</code>
 parameter to match your desired FIFO implementation and your target device.</p>
<h2>Reset</h2>
<p>The pipeline registers do not have enable or reset logic: they are plain,
 always-on registers, which minimises wiring and logic, and matches the
 Intel hyper-register implementation. Thus, <strong>you must hold <code>clear</code> active
 for <code>PIPE_DEPTH</code>+1 cycles at a minimum to make sure all in-flight data and
 control signals have been flushed</strong>, else you might release <code>clear</code> and
 start processing stale data/control still in the pipeline, or place the
 entire pipeline in an inconsistent state resulting in data loss at a much
 later time.</p>
<h2>Ports and Parameters</h2>

<pre>
`default_nettype none

module <a href="./Pipeline_Credit_Buffer.html">Pipeline_Credit_Buffer</a>
#(
    parameter WORD_WIDTH    = 0,
    parameter PIPE_DEPTH    = 0,
    parameter FIFO_DEPTH    = 0,
    parameter FIFO_RAMSTYLE = ""
)
(
    input   wire                        clock,
    input   wire                        clear,

    input   wire                        input_valid,
    output  reg                         input_ready,
    input   wire    [WORD_WIDTH-1:0]    input_data,

    output  wire                        output_valid,
    input   wire                        output_ready,
    output  wire    [WORD_WIDTH-1:0]    output_data
);

    initial begin
        input_ready = 1'b1; // Empty at start, so accept data.
    end
</pre>

<p>We need a minimum of 2x the pipeline latency in FIFO space to account for
 a full round-trip of the control signals, plus any other cycles of latency,
 so we don't stall prematurely or lose data.</p>

<pre>
    `include "<a href="./max_function.html">max_function</a>.vh"

    localparam COUNTER_LATENCY_CYCLES   = 1;
    localparam FIFO_LATENCY_CYCLES      = 2;
    localparam FIFO_DEPTH_MINIMUM       = (2 * PIPE_DEPTH) + FIFO_LATENCY_CYCLES + COUNTER_LATENCY_CYCLES;
    localparam FIFO_DEPTH_ADJUSTED      = max(FIFO_DEPTH, FIFO_DEPTH_MINIMUM); 
</pre>

<p>As always, in a ready/valid handshake interface, data transfers and state
 changes can only happen when both ready and valid are asserted (the
 handshake completes). So we calculate those conditions for both interfaces
 here.</p>

<pre>
    reg input_handshake_done  = 1'b0;
    reg output_handshake_done = 1'b0;

    always @(*) begin
        input_handshake_done =  (input_valid  == 1'b1) && (input_ready  == 1'b1);
        output_handshake_done = (output_valid == 1'b1) && (output_ready == 1'b1);
    end
</pre>

<h2>Input Interface</h2>
<p>The input interface is modulated by the credit counter: as long as we have
 send credits, we signal that we are ready to accept data.  Then we send off
 the data and valid pulses, and receive the ready pulses, both through
 simple register pipelines to reduce cycle time.</p>
<p>The send credit counter starts by assuming everything is empty, so it has
 <code>FIFO_DEPTH_ADJUSTED</code> credits available to use.  We add one to the counter
 width to guarantee we can always contain the <code>FIFO_DEPTH_ADJUSTED</code> count
 plus the extra state of zero (no credits).</p>

<pre>
    `include "<a href="./clog2_function.html">clog2_function</a>.vh"

    localparam CREDIT_COUNTER_INITIAL   = FIFO_DEPTH_ADJUSTED;
    localparam CREDIT_COUNTER_WIDTH     = clog2(CREDIT_COUNTER_INITIAL) + 1;
    localparam CREDIT_COUNTER_ZERO      = {CREDIT_COUNTER_WIDTH{1'b0}};   
    localparam CREDIT_COUNTER_ONE       = {{CREDIT_COUNTER_WIDTH-1{1'b0}},1'b1};   

    reg                             credit_up_down = 1'b0; // 0/1 --> up/down
    reg                             credit_update  = 1'b0;
    wire [CREDIT_COUNTER_WIDTH-1:0] credit_available;

    <a href="./Counter_Binary.html">Counter_Binary</a>
    #(
        .WORD_WIDTH     (CREDIT_COUNTER_WIDTH),
        .INCREMENT      (CREDIT_COUNTER_ONE),
        .INITIAL_COUNT  (CREDIT_COUNTER_INITIAL [CREDIT_COUNTER_WIDTH-1:0])
    )
    send_credits
    (
        .clock          (clock),
        .clear          (clear),

        .up_down        (credit_up_down), // 0/1 --> up/down
        .run            (credit_update),

        .load           (1'b0),
        .load_count     (CREDIT_COUNTER_ZERO),

        .carry_in       (1'b0),
        // verilator lint_off PINCONNECTEMPTY
        .carry_out      (),
        .carries        (),
        .overflow       (),
        // verilator lint_on  PINCONNECTEMPTY

        .count          (credit_available)
    );
</pre>

<h2>Pipelines</h2>
<p>The pipelines are simple registers with no control logic which are always
 enabled and cannot be cleared: if it was possible to control them from the
 input/output interfaces and meet timing, we wouldn't need a pipeline!</p>
<p>First, we continuously pipeline the data from the input interface to the
 output interface.</p>

<pre>
    wire [WORD_WIDTH-1:0] input_data_pipelined;

    <a href="./Register_Pipeline_Simple.html">Register_Pipeline_Simple</a>
    #(
        .WORD_WIDTH     (WORD_WIDTH),
        .PIPE_DEPTH     (PIPE_DEPTH)
    )
    input_data_pipe
    (
        // If PIPE_DEPTH is zero, these are unused
        // verilator lint_off UNUSED
        .clock          (clock),
        .clock_enable   (1'b1),
        .clear          (1'b0),
        // verilator lint_on  UNUSED
        .pipe_in        (input_data),
        .pipe_out       (input_data_pipelined)
    );
</pre>

<p>Then, in parallel, we continuously pipeline a bit which indicates that the
 pipelined input data in the current cycle comes from a completed input
 handshake and is thus valid.</p>

<pre>
    wire input_handshake_done_pipelined;

    <a href="./Register_Pipeline_Simple.html">Register_Pipeline_Simple</a>
    #(
        .WORD_WIDTH     (1),
        .PIPE_DEPTH     (PIPE_DEPTH)
    )
    input_handshake_pipe
    (
        // If PIPE_DEPTH is zero, these are unused
        // verilator lint_off UNUSED
        .clock          (clock),
        .clock_enable   (1'b1),
        .clear          (1'b0),
        // verilator lint_on  UNUSED
        .pipe_in        (input_handshake_done),
        .pipe_out       (input_handshake_done_pipelined)
    );
</pre>

<p>Similarly and in parallel we continuously pipeline, from the output interface
 back to the input interface, a bit which indicates that an output handshake
 has completed in the current cycle and thus a space has been freed in the FIFO.</p>

<pre>
    wire output_handshake_done_pipelined;

    <a href="./Register_Pipeline_Simple.html">Register_Pipeline_Simple</a>
    #(
        .WORD_WIDTH     (1'b1),
        .PIPE_DEPTH     (PIPE_DEPTH)
    )
    output_handshake_pipe
    (
        // If PIPE_DEPTH is zero, these are unused
        // verilator lint_off UNUSED
        .clock          (clock),
        .clock_enable   (1'b1),
        .clear          (1'b0),
        // verilator lint_on  UNUSED
        .pipe_in        (output_handshake_done),
        .pipe_out       (output_handshake_done_pipelined)
    );
</pre>

<h2>Output Interface</h2>
<p>Finally, we store any valid pipelined input data into a FIFO until it can
 be read out. The FIFO itself is the output interface.</p>

<pre>
    <a href="./Pipeline_FIFO_Buffer.html">Pipeline_FIFO_Buffer</a>
    #(
        .WORD_WIDTH     (WORD_WIDTH),
        .DEPTH          (FIFO_DEPTH_ADJUSTED),
        .RAMSTYLE       (FIFO_RAMSTYLE)
    )
    output_storage
    (
        .clock          (clock),
        .clear          (clear),

        .input_valid    (input_handshake_done_pipelined),
        // verilator lint_off PINCONNECTEMPTY
        .input_ready    (),
        // verilator lint_on  PINCONNECTEMPTY
        .input_data     (input_data_pipelined),

        .output_valid   (output_valid),
        .output_ready   (output_ready),
        .output_data    (output_data)
    );
</pre>

<h2>Control Logic</h2>
<p>The control logic for the <code>send_credits</code> counter is simple: at each cycle, if
 the input handshake completes, decrement the credit counter by one, if the
 output handshake completes, increment the credit counter by one, if both
 complete at the same time or no handshake completes this cycle, the counter
 stays constant.  So long as the send credit count is not zero, we can
 accept input data and send it through the pipeline.</p>

<pre>
    always @(*) begin
        credit_up_down  = (output_handshake_done_pipelined == 1'b0); // 0/1 --> up/down
        credit_update   = (output_handshake_done_pipelined != input_handshake_done);
        input_ready     = (credit_available != CREDIT_COUNTER_ZERO);
    end

endmodule
</pre>

<hr>
<p><a href="./index.html">Back to FPGA Design Elements</a>
<center><a href="https://fpgacpu.ca/">fpgacpu.ca</a></center>
</body>
</html>

